Annotation Sémantique de Documents Semi-Structurés pour la recherche d'information. (Semantic Annotation of Semi-structured Documents for Information Retrieval)
Abstract
The semantic web is defined by a set of methods and technologies enabling software agents to reason about the contents of Web resources. This vision of the Web depends on the construction of ontologies and the use of metadata to represent these resources. The objective of our thesis is to annotate semantically tagged documents related to a domain of interest. These documents may contain well-structured nodes and textual ones. We assume having a domain ontology defined by concepts, relations between these concepts and their properties. This ontology includes a lexical component (labels, a set of named entities (NE) and terms) for each concept. We have defined an automatic and domain-independent approach, SHIRI-Extract, that extracts terms and NE and aligns them with the concepts of the ontology. The alignment uses the lexical component or the Web to discover new terms. We have defined an annotation model which represents the results of extraction and annotation. The metadata of this model distinguish nodes depending on
Related resources
Une métrique pondérée pour la recherche textuelle d'images dans des documents semi-structurés
The birth of the XML standard and the growing use of images in electronic documents raised an open issue in information retrieval: image retrieval in semi-structured documents. This article presents a method to evaluate a semantic representation of images using the text and the document structure. More precisely, we propose a measure that evaluates the participation of each element of the docum...
Annotation sémantique des ressources Web : Etat de l'art et perspectives de recherche
The problem of semantically annotating Web resources interests researchers from different communities who seek to improve the information retrieval process. Semantic annotation of documents and Web pages is hard work, so automating the construction process is essential to modernize information retrieval in the Semantic Web. In this paper, we present a new semantic annotation approach of Web resource...
Recherche d'information structurée. Vers un modèle possibiliste pour la recherche d'information dans des documents structurés
In this paper, we are interested in information retrieval in structured XML documents. To this end, we present a model for structured information retrieval based on possibilistic networks. The relations between document elements and between elements and terms are modelled by measures of possibility and necessity. In this model, the user's request starts a process of propagation to recover the documents or ...
Recherche approchée d'information dans une base de documents semi-structurés
ABSTRACT. We propose algorithms dedicated to indexing and approximate information retrieval in heterogeneous semi-structured XML databases. The proposed indexing model is suited to searching textual content in XML contexts defined by tree structures. The approximate search mechanisms implemented rely on a Levenshtein distance ...
Indexation relationnelle pour la recherche de documents structurés interreliés
In information retrieval on classical structured documents, one problem consists in browsing the result space using the structure of the documents. Taking into account other links between doxels increases this problem. In this article, we consider relative exhaustivity and relative specificity values computed on non-compositional linked doxels to index the corpus; adding this information to th...